
********************************************************************************************************************************************
*This .do file takes EIA Power Plant Data and school location data from NCES as inputs and outputs results for Panel A of Table 2 and Panel B of Table A.5.

*Inputs: 
*1. EIA 923 forms. Files in \EIA with prefix  "EIA923" for 2011 on, or "f906920_2004/f906920_2005" for 2004-2005. Data available at: https://www.eia.gov/electricity/data/eia923/
*2a. Plant addresses from EIA Form 860. Files in \EIA and called "2___Plant_Y2018", "2___Plant_Y2019" and "PlantY2012".  Data available at: https://www.eia.gov/electricity/data/eia860/
*2b. Plant list from EIA form 860 in 2005: "EIA Data\PlantY05.xls".  Data available at: https://www.eia.gov/electricity/data/eia860/
*2c. "EIA Data\Addresses_2007.dta" is simply the STATA data format of EIA form 860 in 2007: "EIA Data\PlantY07.xls" with only mailing address information kept
*2d. "EIA Data\2007_GeocodedAddress.dta" is the geocoded addresses of "EIA Data\Addresses_2007.dta". Geocoding was done using the "geocodehere" STATA command which uses the HERE Maps API.
*3. "Schools Data\schoolsdata.dta": School enrollments and school locations from the NCES for school years 2004-05, 2011-12, and 2018-19.  These data were created using the NCES Table generator available at https://nces.ed.gov/ccd/elsi/tablegenerator.aspx.
*3b. "Schools Data\2006_schoolsdata.dta": Same as 3a, but only for year 2006 (just used as missing some demographics for some schools in the 2005 NCES data). These data were created using the NCES Table generator available at https://nces.ed.gov/ccd/elsi/tablegenerator.aspx.

*Outputs the following results:
*1. Panel A of Table 2
*2. Panel B of Table A.5
********************************************************************************************************************************************

clear all
set more off
cd "C:\Users\gilraine\Dropbox\PM25_test_draft\R&R\ReplicationPackage\Derived Data\Table 2 Data"

*Looping over years. Note 2005 is a bit "special" as I need to link the lat/lons and student demos to some other files; hence the separate code base for that year. Only diff for 2012 and 2019 is that in 2019 can get lat/lons for prior year which can't for 2012 (as no lat/lons pre-2012)

foreach year of numlist 2005 2012 2019{
clear all
display "Year is `year'"
local m=`year'-1
if `year'==2005{
*First get lats and longs from EIA form 860. This form does not exist pre-2012, so start by assign lat/lons according to 2012 lat/lons.  Then, for unmerged get their (already) geocoded address via the 2007 file (no addresses before that)

*First get 2012 lat/lons from EIA 860
import excel "EIA Data\PlantY2012.xlsx", sheet("plant2012") cellrange(A2:AB7291) firstrow
keep PlantCode Latitude Longitude
qui destring Latitude Longitude, replace
qui drop if Latitude==.
ren PlantCode PlantId
qui compress
qui save "LatLons2012.dta", replace
*Merge these onto the 2005 plants
clear all
import excel "EIA Data\PlantY05.xls", sheet("PlantY05") firstrow
ren PLNTCODE PlantId
qui merge 1:1 PlantId using "LatLons2012.dta"
qui drop if _merge==2
drop _merge
*If not merge, merge on using the 2007 geocoded addresses
qui merge 1:1 PlantId using "EIA Data\2007_GeocodedAddress.dta"
qui drop if _merge==2
drop _merge
qui replace Latitude=lat if Latitude==.
qui replace Longitude=lon if Longitude==.
qui compress
qui save "EIA Data\LatLons2005.dta", replace
erase "LatLons2012.dta"

*Now get pollution data from 2005 EIA Form 906 (predecessor of EIA 923)
clear all
import excel "EIA Data\f906920_2005.xls", sheet("Page 1 Generation and Fuel Data") cellrange(A8:CS9351) firstrow
*Using the fuel codes to define fuel type (? OTH)
qui gen coal=(AERFuelTypeCode=="COL" | AERFuelTypeCode=="PC" | AERFuelTypeCode=="WOC" | AERFuelTypeCode=="WWW")
qui gen gas=(AERFuelTypeCode=="NG" | AERFuelTypeCode=="OOG" | AERFuelTypeCode=="MLG")
qui gen oil=(AERFuelTypeCode=="DFO" | AERFuelTypeCode=="WOO" | AERFuelTypeCode=="RFO")
qui gen renewable=(AERFuelTypeCode=="SUN" | AERFuelTypeCode=="GEO" | AERFuelTypeCode=="HPS" | AERFuelTypeCode=="HYC" | AERFuelTypeCode=="NUC" | AERFuelTypeCode=="ORW" | AERFuelTypeCode=="WND")
qui drop if AERFuelTypeCode=="OTH"
qui destring NETGEN_JAN NETGEN_FEB NETGEN_MAR NETGEN_APR NETGEN_MAY, replace
foreach var of varlist NETGEN_JAN NETGEN_FEB NETGEN_MAR NETGEN_APR NETGEN_MAY{
qui replace `var'=0 if `var'==.
qui replace `var'=0 if `var'<0
}
qui gen Netgen=NETGEN_JAN + NETGEN_FEB + NETGEN_MAR + NETGEN_APR + NETGEN_MAY
ren PlantID PlantId
keep PlantId coal gas oil renewable Netgen
qui merge m:1 PlantId using "EIA Data\LatLons2005.dta"
qui tab _merge if PlantId==99999
qui drop if _merge==2
qui drop if PlantId==99999
drop _merge

*Only 2 above 50K missing Lats
qui su Netgen if Latitude==.
*Assigning to Zip code geolocation lat/lon for over 50k
qui replace Latitude=29.408248 if PLNTNAME=="S&L Cogeneration"
qui replace Longitude=-94.943959 if PLNTNAME=="S&L Cogeneration"
qui replace Latitude=40.827449 if PLNTNAME=="Roche Vitamins"
qui replace Longitude=-75.077307 if PLNTNAME=="Roche Vitamins"
qui drop if Latitude==.
qui drop if Longitude==.
gen year=2005
keep PlantId coal gas oil renewable Netgen Latitude Longitude
qui compress
qui save "2005.dta", replace

*Now need to do same thing for 2004 (as 2004-05 school year covers both calendar years)
*2004 EIA Form 906 (predecessor of EIA 923)
clear all
import excel "EIA Data\f906920_2004.xls", sheet("Page 1 Generation and Fuel Data") cellrange(A8:CS9510) firstrow
*Using the fuel codes to define fuel type (? OTH)
qui gen coal=(AERFuelTypeCode=="COL" | AERFuelTypeCode=="PC" | AERFuelTypeCode=="WOC" | AERFuelTypeCode=="WWW")
qui gen gas=(AERFuelTypeCode=="NG" | AERFuelTypeCode=="OOG" | AERFuelTypeCode=="MLG")
qui gen oil=(AERFuelTypeCode=="DFO" | AERFuelTypeCode=="WOO" | AERFuelTypeCode=="RFO")
qui gen renewable=(AERFuelTypeCode=="SUN" | AERFuelTypeCode=="GEO" | AERFuelTypeCode=="HPS" | AERFuelTypeCode=="HYC" | AERFuelTypeCode=="NUC" | AERFuelTypeCode=="ORW" | AERFuelTypeCode=="WND")
qui drop if AERFuelTypeCode=="OTH"
qui destring NETGEN_SEP NETGEN_OCT NETGEN_NOV NETGEN_DEC, replace
foreach var of varlist NETGEN_SEP NETGEN_OCT NETGEN_NOV NETGEN_DEC{
qui replace `var'=0 if `var'==.
qui replace `var'=0 if `var'<0
}
qui gen Netgen=NETGEN_SEP + NETGEN_OCT + NETGEN_NOV + NETGEN_DEC
ren PlantID PlantId
keep PlantId coal gas oil renewable Netgen
qui merge m:1 PlantId using "EIA Data\LatLons2005.dta"
qui tab _merge if PlantId==99999
qui drop if _merge==2
qui drop if PlantId==99999
qui drop _merge
*Only 2 above 50K missing Lats
qui su Netgen if Latitude==.
*Assigning to Zip code lat/lon for over 50k
qui replace Latitude=29.408248 if PLNTNAME=="S&L Cogeneration"
qui replace Longitude=-94.943959 if PLNTNAME=="S&L Cogeneration"
qui replace Latitude=40.827449 if PLNTNAME=="Roche Vitamins"
qui replace Longitude=-75.077307 if PLNTNAME=="Roche Vitamins"
qui replace Latitude=30.44942605868427 if PLNTNAME=="ExxonMobil Baton Rouge Cogen"
qui replace Longitude=-91.17935626981674 if PLNTNAME=="ExxonMobil Baton Rouge Cogen"
qui drop if Latitude==.
qui drop if Longitude==.
qui gen year=2004
keep PlantId coal gas oil renewable Netgen Latitude Longitude
qui compress
qui save "2004.dta", replace

*Combine 2004/05 data
clear all
use "2005.dta"
append using "2004.dta"
erase "2005.dta"
erase "2004.dta"
gen id=_n
qui save "2004_05.dta", replace

*Now get the school data; will use geonear to find all plants within 40km of school
clear all
use "Schools Data\schoolsdata.dta" 
*Keep only the respective academic year
qui keep if year==2005
*Missing race for some schools (like 2%) for 2005. Lets just impute that based on % of race in next year's data
qui merge 1:1 ncesid using "Schools Data\2006_schoolsdata.dta"
qui drop if _merge==2
qui replace black=pct_black*enroll if black==.
qui replace white=pct_white*enroll if white==.
qui drop _merge pct_black pct_white
preserve
keep ncesid agencyid
qui save "SUPERTEMP.dta", replace
restore

*Find all plants within 40km
geonear ncesid lat lon using "2004_05.dta", neighbors(id Latitude Longitude) long within(40) ellipsoid 
qui merge m:1 id using "2004_05.dta"
qui qui keep if _merge==3
drop _merge
*Get agencyid
qui merge m:1 ncesid using "SUPERTEMP.dta"
erase "SUPERTEMP.dta"
drop _merge
order ncesid agencyid
qui gen coal_prod=Netgen if coal==1
qui gen gas_prod=Netgen if gas==1
qui gen oil_prod=Netgen if oil==1
qui gen renew_prod=Netgen if renewable==1
*Collapse total prod at the school level
collapse (sum) coal_prod gas_prod oil_prod renew_prod, by(ncesid agencyid)
qui save "schoolsdata_temp.dta", replace
erase "EIA Data\LatLons2005.dta"
}
else if `year'>=2012{
clear all
*First get lats and longs from EIA form 860
if `year'==2012{
import excel "EIA Data\PlantY2012.xlsx", sheet("plant2012") cellrange(A2:AB7291) firstrow
}
else if `year'==2019{
import excel "EIA Data\2___Plant_Y2019.xlsx", sheet("Plant") firstrow
}
keep PlantCode Latitude Longitude
qui destring Latitude Longitude, replace
qui drop if Latitude==.
ren PlantCode PlantId
qui compress
qui save "LatLons`year'.dta", replace
*EIA Form 923
clear all
if `year'==2012{
import excel "EIA Data\EIA923_Schedules_2_3_4_5_M_12_2012_Final_Revision.xlsx", sheet("Page 1 Generation and Fuel Data") cellrange(A6:CS11010) firstrow
}
else if `year'==2019{
import excel "EIA Data\EIA923_Schedules_2_3_4_5_M_12_2019_Final_Revision.xlsx", sheet("Page 1 Generation and Fuel Data") cellrange(A6:CS14523) firstrow
}
*Using the fuel codes to define fuel type (? OTH)
qui gen coal=(AERFuelTypeCode=="COL" | AERFuelTypeCode=="PC" | AERFuelTypeCode=="WOC" | AERFuelTypeCode=="WWW")
qui gen gas=(AERFuelTypeCode=="NG" | AERFuelTypeCode=="OOG" | AERFuelTypeCode=="MLG")
qui gen oil=(AERFuelTypeCode=="DFO" | AERFuelTypeCode=="WOO" | AERFuelTypeCode=="RFO")
qui gen renewable=(AERFuelTypeCode=="SUN" | AERFuelTypeCode=="GEO" | AERFuelTypeCode=="HPS" | AERFuelTypeCode=="HYC" | AERFuelTypeCode=="NUC" | AERFuelTypeCode=="ORW" | AERFuelTypeCode=="WND")
qui drop if AERFuelTypeCode=="OTH"
qui destring NetgenJanuary NetgenFebruary NetgenMarch NetgenApril NetgenMay, replace
foreach var of varlist NetgenJanuary NetgenFebruary NetgenMarch NetgenApril NetgenMay{
qui replace `var'=0 if `var'==.
qui replace `var'=0 if `var'<0
}
qui gen Netgen=NetgenJanuary + NetgenFebruary + NetgenMarch + NetgenApril + NetgenMay
qui keep PlantId coal gas oil renewable Netgen
qui merge m:1 PlantId using "LatLons`year'.dta"
qui tab _merge if PlantId==99999
qui keep if _merge==3
drop _merge
gen year=`year'
qui compress
qui save "`year'.dta", replace

*Prior year
clear all
if `year'==2012{
import excel "EIA Data\EIA923_Schedules_2_3_4_5_2011_Final_Revision.xlsx", sheet("Page 1 Generation and Fuel Data") cellrange(A6:CS10467) firstrow
*Using the fuel codes to define fuel type (? OTH)
qui gen coal=(AERFuelTypeCode=="COL" | AERFuelTypeCode=="PC" | AERFuelTypeCode=="WOC" | AERFuelTypeCode=="WWW")
qui gen gas=(AERFuelTypeCode=="NG" | AERFuelTypeCode=="OOG" | AERFuelTypeCode=="MLG")
qui gen oil=(AERFuelTypeCode=="DFO" | AERFuelTypeCode=="WOO" | AERFuelTypeCode=="RFO")
qui gen renewable=(AERFuelTypeCode=="SUN" | AERFuelTypeCode=="GEO" | AERFuelTypeCode=="HPS" | AERFuelTypeCode=="HYC" | AERFuelTypeCode=="NUC" | AERFuelTypeCode=="ORW" | AERFuelTypeCode=="WND")
qui drop if AERFuelTypeCode=="OTH"
qui destring Netgen_Sep Netgen_Oct Netgen_Nov Netgen_Dec, replace
foreach var of varlist Netgen_Sep Netgen_Oct Netgen_Nov Netgen_Dec {
qui replace `var'=0 if `var'==.
qui replace `var'=0 if `var'<0
}
qui gen Netgen=Netgen_Sep + Netgen_Oct + Netgen_Nov + Netgen_Dec
keep PlantId coal gas oil renewable Netgen
qui merge m:1 PlantId using "LatLons`year'.dta"
erase "LatLons`year'.dta"
qui tab _merge if PlantId==99999
qui keep if _merge==3
drop _merge
qui gen year=`m'
qui compress
qui save "`m'.dta", replace
}
else if `year'==2019{
*First get lats and longs from EIA form EIA 860 (couldn't do this for 2012 as 2012 is first year with lat and lons)
import excel "EIA Data\2___Plant_Y2018.xlsx", sheet("Plant") cellrange(A2:AP10982) firstrow
keep PlantCode Latitude Longitude
qui destring Latitude Longitude, replace
qui drop if Latitude==.
ren PlantCode PlantId
qui compress
qui save "LatLons2018.dta", replace
*2018 EIA Form 923
clear all
import excel "EIA Data\EIA923_Schedules_2_3_4_5_M_12_2018_Final_Revision.xlsx", sheet("Page 1 Generation and Fuel Data") cellrange(A6:CS13965) firstrow
*Using the fuel codes to define fuel type (? OTH)
qui gen coal=(AERFuelTypeCode=="COL" | AERFuelTypeCode=="PC" | AERFuelTypeCode=="WOC" | AERFuelTypeCode=="WWW")
qui gen gas=(AERFuelTypeCode=="NG" | AERFuelTypeCode=="OOG" | AERFuelTypeCode=="MLG")
qui gen oil=(AERFuelTypeCode=="DFO" | AERFuelTypeCode=="WOO" | AERFuelTypeCode=="RFO")
qui gen renewable=(AERFuelTypeCode=="SUN" | AERFuelTypeCode=="GEO" | AERFuelTypeCode=="HPS" | AERFuelTypeCode=="HYC" | AERFuelTypeCode=="NUC" | AERFuelTypeCode=="ORW" | AERFuelTypeCode=="WND")
qui drop if AERFuelTypeCode=="OTH"
qui destring NetgenSeptember NetgenOctober NetgenNovember NetgenDecember, replace
foreach var of varlist NetgenSeptember NetgenOctober NetgenNovember NetgenDecember {
qui replace `var'=0 if `var'==.
qui replace `var'=0 if `var'<0
}
qui gen Netgen=NetgenSeptember + NetgenOctober + NetgenNovember + NetgenDecember
keep PlantId coal gas oil renewable Netgen
qui merge m:1 PlantId using "LatLons2018.dta"
erase "LatLons2018.dta"
qui tab _merge if PlantId==99999
qui keep if _merge==3
drop _merge
gen year=2018
qui compress
qui save "2018.dta", replace
} 

clear all
use "`year'.dta"
append using "`m'.dta"
erase "`year'.dta"
erase "`m'.dta"
gen id=_n
qui save "`m'_`year'.dta", replace

clear all
use "Schools Data\schoolsdata.dta" 
qui keep if year==`year'
geonear ncesid lat lon using "`m'_`year'.dta", neighbors(id Latitude Longitude) long within(40) ellipsoid 
qui merge m:1 id using "`m'_`year'.dta"
qui keep if _merge==3
qui drop _merge
qui gen coal_prod=Netgen if coal==1
qui gen gas_prod=Netgen if gas==1
qui gen oil_prod=Netgen if oil==1
qui gen renew_prod=Netgen if renewable==1

collapse (sum) coal_prod gas_prod oil_prod renew_prod, by(ncesid)
qui save "schoolsdata_temp.dta", replace
}

*Same code for all years
clear
use "Schools Data\schoolsdata.dta" 
qui keep if year==`year'
qui merge 1:1 ncesid using "schoolsdata_temp.dta"
erase "schoolsdata_temp.dta"
qui su coal_prod [aw=enroll]
local coal=r(mean)
qui su oil_prod [aw=enroll]
local oil=r(mean)
di "Coal + Oil Exposure"
di `coal'+`oil'
di "Gas Exposure"
qui su gas_prod [aw=enroll]
di r(mean)
qui su renew_prod [aw=enroll]
di "Renewable Exposure"
di r(mean)
*Now black-white gaps
qui su coal_prod [aw=black]
local coal_b=r(mean)
qui su oil_prod [aw=black]
local oil_b=r(mean)
qui su coal_prod [aw=white]
local coal_w=r(mean)
qui su oil_prod [aw=white]
local oil_w=r(mean)
di "BW Coal Oil Gap"
di `coal_b'+`oil_b' - (`coal_w'+`oil_w')
di "BW Gas Gap"
qui su gas_prod [aw=black]
local gas_b=r(mean)
qui su gas_prod [aw=white]
local gas_w=r(mean)
di `gas_b'-`gas_w'
di "BW Renew Gap"
qui su renew_prod [aw=black]
local renew_b=r(mean)
qui su renew_prod [aw=white]
local renew_w=r(mean)
di `renew_b'-`renew_w'
qui save "CRAZYTEMP.dta", replace
***************************************************************************************************************
********Appendix Table A.5: CALCULATING GAPS IN COAL EXPOSURE COMING FROM WITHIN VS. ACROSS DISTRICTS**********
***************************************************************************************************************
clear
use "CRAZYTEMP.dta"
*Total, Within, and across gaps, doing it one at a time for each fuel
foreach var of varlist coal_prod oil_prod gas_prod renew_prod{
clear
use "CRAZYTEMP.dta"
*Total Gap for blacks and whites:
qui su `var' [aw=black]
local total_black=r(mean)
qui su `var' [aw=white]
local total_white=r(mean)
*Across district gap. Idea: Assign all students within a district the same coal production. Then calculate the black-white exposure gap
preserve
qui gen total=black+white
collapse (rawsum) enroll black white (mean) `var' [aw=total], by(agencyid)
qui su `var' [aw=black]
local across_black=r(mean)
qui su `var' [aw=white]
local across_white=r(mean)
restore
*Within district gaps. Idea: For each district, find the mean black and white coal exposure in the district. Then find the difference between this 
*exposure for each district.  That is the within-district exposure gap.  Note that Total Gap != Across Gap + Within Gap due to weightings (it is actually a mix
*of the total weight (as that is how find "total" and the BW gap as that only includes B+W students. If you weight "total" using just black+white students then the math "works"
*Given this, I will just set the within as total-across
qui expand 2, gen(demo)
qui gen demo_enroll=black if demo==0
qui gen black`var'=`var' if demo==0
qui replace demo_enroll=white if demo==1
qui gen white`var'=`var' if demo==1
collapse (rawsum) black white enroll (mean) black`var' white`var' [aw=demo_enroll], by(agencyid)
qui gen total=black+white
qui gen black_white_gap_within=black`var'-white`var'
*Within Gap
*Weighting by B+W enroll
qui su black_white_gap_within [aw=total]
*Weighting by Total enroll
qui su black_white_gap_within [aw=enroll]
*Total Gap
di "BW Exposure Gap for `var'"
di `total_black'-`total_white'
*Across Gap
di "BW Across Exposure Gap for `var'"
di `across_black'-`across_white'
di "BW Within Exposure Gap for `var'"
*Within Gap
di `total_black'-`total_white'-(`across_black'-`across_white')
}
erase "CRAZYTEMP.dta"
}



